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Abstract: Distributed estimation of Gaussian mixtures has many applications in wireless 
sensor network (WSN), and its energy-efficient solution is still challenging. This paper 
presents a novel diffusion-based EM algorithm for this problem. A diffusion strategy is 
introduced for acquiring the global statistics in EM algorithm in which each sensor node 
only needs to communicate its local statistics to its neighboring nodes at each iteration. 
This improves the existing consensus-based distributed EM algorithm which may need 
much more communication overhead for consensus, especially in large scale networks. 
The robustness and scalability of the proposed approach can be achieved by distributed 
processing in the networks. In addition, we show that the proposed approach can be 
considered as a stochastic approximation method to find the maximum likelihood estimation 
for Gaussian mixtures. Simulation results show the efficiency of this approach. 

Keywords: diffusion; distributed processing; EM algorithm; consensus; wireless sensor 
networks 
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1. Introduction 

Recently, with the advances in micro-electronics and wireless communications, the large scale 
wireless sensor networks (WSNs) have found applications in many domains, such as environment 
monitoring, vehicle tracking, healthcare, etc. These WSNs usually consist of massively distributed 
low-cost, low-power and small size sensor nodes, which have sensing, data precessing and 
communication capabilities [1]. However due to limited sensor and network resources, energy-efficient 
collaborative signal processing algorithms are needed \2-A] . 

Mixture density is a powerful family of probability density to represent some physical quantities. 
The most wildly used mixture is Gaussian mixture model (GMM), which has wide applications in 
WSNs such as energy based multi-source localization and maneuvering target tracking [5,6]. Density 
estimation is the most important step in exploratory data analysis for such models. The well-known 
expectation-maximization (EM) algorithm has been extensively applied for such a purpose [7]. 

The EM algorithm is used for finding maximum likelihood estimates of parameters in probabilistic 
models, where the model depends on unobserved latent variables. The EM algorithm was explained 
and given its name in a classic 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin [8]. 
They pointed out that the method had been "proposed many times in special circumstances" by earlier 
authors. The EM algorithm is implemented by performing an expectation step (E-step), which computes 
an expectation of the likelihood by including the latent variables as if they were observed, and a 
maximization step (M-step), which computes the maximum likelihood estimates of the parameters by 
maximizing the expected likelihood found on the E-step. The parameters found on the M-step are 
then used to begin another E-step, and the process is repeated until converging to a local maximum 
for the likelihood. 

The EM algorithm has been widely used in many areas, such as econometric, clinical, engineering 
and sociological studies [9,10]. However, most existing EM algorithms in WSNs are designed in a 
centralized way with a fusion center. Distributed algorithms have many advantages on robustness 
and scalability in large scale WSNs over the centralized scheme due to their distributed nature, node 
redundancy, and the lack of dangers when the fusion center fails. Robust, asynchronous and distributed 
algorithms have seen more and more opportunities as the intelligent and autonomous sensors play a more 
important role in the networks, as demonstrated in estimation, detection and control [11-13]. 

The distributed processing framework in WSNs depends on the strategies of cooperation that are 
allowed among the nodes. Two of the strategies used in the literature for distributed processing are 
the incremental strategy and the consensus strategy. In the incremental strategy, information flows in 
a sequential manner from one node to the other. This mode of operation requires a cyclic pattern of 
collaboration among the nodes [14]. This incremental idea is first introduced to distributed estimation 
in adaptive networks [15]. An incremental strategy-based EM algorithm for Gaussian mixtures in WSNs 
is proposed in [16]. In this algorithm, each node computes its local statistics. The algorithm obtains 
the global statistics along a pre-selected path by adding the local statistics of the node, then updates the 
estimate of each node in the reverse order of the path. Although it does not need a fusion center and 
tends to require the least amount of communications, usually there is a long way from the first node to 



Sensors 2011, 11 



6299 



the last node of the pre-selected path in a large scale WSNs that may cause serious reliability problem 
when any node along the path fails. 

In the consensus strategy, distributed estimation algorithms are proposed in [17,18], where the 
consensus is implemented in two time scales: a slower time scale for obtaining measurements and a 
faster time scale for processing iterations. The fundamental objective in consensus-based algorithm is 
to obtain the same estimate for all nodes at each iteration. In [19], another consensus-based distributed 
EM algorithm for Gaussian mixtures is proposed. In this algorithm, a consensus filter (C-step), in which 
the global statistics for each sensor node is obtained by average consensus, is implemented between the 
E-step and the M-step at each iteration. However, in a large scale sensor network, the consensus for 
all the sensor nodes at each iteration can only be achieved with large amount of communications and 
consume a lot of battery power. 

For the incremental strategy-based methods, finding the communication path through all nodes is the 
well-known Hamiltonian circuit problem, which is proved to be NP-complete [20] . The nature of two 
time scales in consensus-based methods limits their applications to WSNs which is resource constrained 
for communications [15]. Recently, a number of novel diffusion adaptation strategies are introduced 
for distributed estimation, detection and filtering in networked systems, including diffusion recursive 
least-squares estimation algorithm [21], diffusion least-mean squares estimation algorithms [22-24], 
diffusion adaption for distributed detection [25], and diffusion Kalman filtering and smoothing algorithm 
for dynamic system [26]. Different from the consensus strategy with two time scales among neighboring 
nodes, diffusion strategy only needs to update the estimate for each node by single time scale, which can 
therefore lead to good performance with less communication overhead. 

In this paper, we propose a diffusion scheme of EM (DEM) algorithm for Gaussian mixtures in 
WSNs. At each iteration, the time-varying communication network is modeled as a random graph. A 
diffusion-step (D-step) is implemented between the E-step and the M-step. In the E-step, sensor nodes 
compute the local statistics by using local observation data and parameters estimated at the last iteration. 
In the D-step, each node exchanges local information only with its current neighbors and updates the 
local statistics with exchanged information. In the M-step, the sensor nodes compute the estimation of 
parameter using the updated local statistics by the D-step at this iteration. Compared with the existing 
distributed EM algorithms, the proposed approach can extensively save communication for each sensor 
node while maintain the estimation performance. Different from the linear estimation methods such as 
the least-squares and the least-mean squares estimation algorithms, each iteration of EM algorithm is a 
nonlinear transform of measurements. The steady-state performance of the proposed DEM algorithm 
can not be analyzed by linear way. Instead, we show that the DEM algorithm can be considered as a 
stochastic approximation method to find the maximum likelihood estimation for Gaussian Mixtures. 

The rest of this paper is organized as follows. In Section 2, the centralized EM algorithm for Gaussian 
mixtures in WSNs is introduced, and two distributed strategies for EM algorithms are reviewed. In 
Section 3, a randomly time- varying network is introduced and the DEM algorithm in WSNs is proposed. 
Section 4 discusses the asymptotical property of the proposed DEM algorithm, and shows that the 
proposed algorithm can be viewed as a stochastic approximation to the maximum likelihood estimator 
according to the Robbins-Monro stochastic approximation theory. Simulation results are presented in 
Section 5 to show the performance of our method. Concluding remarks are made in Section 6. 
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2. EM Algorithm for Gaussian Mixtures 

We consider a network of M sensor nodes, each of which has A^^ data observations {nmn}, 
m — 1,2,-- - ,M, n = 1,2,- - , A^^. These observations are drawn from a K Gaussian mixtures with 
mixture probabiUties ai, • • • , ax, 

K 

Vmn ~ X] «i • Sj) (1) 

where N{n,T,) denote the Gaussian density function with mean // and covariance E. Let 
z e {1, 2, • • • , K} denote the missing data where Gaussian y comes from. The probability that a 
particular observation comes from the i-th dimensional Gaussian is 

P{y\z ^i,e}^ (27r)-^/^|Er^/'exp{-^(y - ^i,fj:-\y - 
Our task is using the observations to estimate the unknown parameters 

d — {fJ'l, ■ ■ ■ , fJ'K, Si, • • • , Ex, Oil, ■ • ■ , cxk} 

that is, the mean and standard deviation of each Gaussian and the probability for each Gaussian being 
drawn for any given point. For the centralized scheme for the EM algorithm, observations are transmitted 
to the central fusion center for executing the standard EM with all data observations. 

The EM algorithm seeks to find the maximum likelihood estimation of the marginal likelihood 
by iteratively applying the E-step and M-step. In the E-step, we calculate the expectation of the 
log likelihood function, with respect to the conditional distribution of the unobserved data z given 
observations under the current estimate of the parameters 9*, 



Piymnl^mn — k, 0^)p{Zmn — k\9* 

k=l 



where 9^ means the estimation of parameters at the t-th iteration of the EM algorithm. Define the 
expected log-likelihood of the joint event: 



M Nn 



Q{9,9') = E,{\n[l[l[p{y^,„z\9)]\ymn,0'} 

m=l n=l 

M Nm 

= EzC^^\T).p{ymn,z\^)\ymn,9*} 
m=l n=l 

M Nm 

= ^^E^{^np{ymn,z\9)\ymn,9''} 

m=l n=l 

M N,n 

= ^ ^P{Zmn = i\ymn, 9^) In p{Zmn = hymn\9) 
m=l n=l 

M Nm 

= ^^P{Zmn = i\ymmO^)ln\p{y^ri\Zmn = hO)p{Zmn = ^l^*)] 



m=l n=l 
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In the M-step, we want to maximize the expected log-likelihood Q{6, 0*) and obtain the estimation of 
next step for 0, i.e., 

= argmaxQ(^,^*) 

6 

By solving this maximization problem, we can get the estimation of i-th component's parameters for 
the Gaussian mixtures at step t + 1: 

M Nm 

t+1 _ m=l n=l 



771=1 n=l 

y\t+l m=l n=l 

Pi^^mn ^\ymn: ^ ) 

m=l n=l 



iVi + ■ ■ ■ + A^A^ 

m=l n=l 



= i\ymn, 0^) (2) 



The centralized EM algorithm enables the calculation of the global solution using all the observation 
data from each sensor in the network. We can rewrite Equation (2) into "sensor form" by introducing 
membership function and local statistics as follows, 

• Membership Function 

^mn,i Pi^mn ^|l/mrn ^ ) (3) 

The membership function is the probability of which the observation data ymn belongs to i-th 
component of the Gaussian Mixtures. 

• Local Statistics 

Nm Nm Nm 

^in,i ~ ^ ] ^mn,ii Pm.i ~ ^ ] ^mn^iymn) 1m,i ~ ^ ] ^mn,iiymn ~~ l^i){ymn ~ f^i) (4) 
n=l n=l n=l 

The estimation of parameters at the t + 1 step in Equation (2) can be rewritten as follows: 

M M 

t+1 ^ m=l yt+1 ^ m=l t+1 ^ ^ C<^ 



M ' * M ' AAi -I- ■ ■ ■ -I- A^A 



m=l m=l 



M M M 

We denote = E ^Sl,/^^' = E /^S^^*""' = E 7^' the global statistics. Since the 

m=l m=l m=l 

Local Statistics j, 7m,i} can be calculated locally given the current estimated parameter set 9^ 

and the local observation data there are two distributed strategies for implementation of the EM 
algorithm in WSNs. 
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(1) Incremental Updating Strategy 

In this strategy, a cyclic sequential communication path between sensor nodes is pre-determined. 
In the forward path, the global statistics is updated by the pre-selected path for all sensor nodes 
according to 

= _L 5* . = + /5* . ^* = ^* + V . 

In the backward path, each node updates the parameter set 6**, which is the global solution at the 
M-step, according to Equation (5) in the reverse order. 

(2) Consensus Strategy 

The estimation of parameter set 6^ in Equation (5) is not changed at the M-step. If we redefine the 
global statistics {e\, [51, 7*} as the averages of the local statistics {e^ j, j, 7m,i}' then 



M M 

M 



M ^ ' m,i M ^ 'm,t ^ 



m=l m=l 

Therefore, the idea of the average consensus filter can be used after the computing local statistics 
at the E-step. Each sensor node can get the global statistics by exchanging the local statistics with 
its neighbors until reaching consensus all over the network. The global solution for each sensor 
node at the M-step can be obtained by Equation (6). 

For both strategies, individual sensor node first computes its own local statistics using the local 
observation data and estimation of the last step, and then gets the global statistics or global estimation 
for the parameters by changing information with its neighbors in some specific way. Therefore, each 
node can obtain the estimation of the parameters equivalent to the estimation in centralized way at every 
step, and get the same convergence point as the standard EM. 

3. Diffusion Scheme of EM Algorithm for Gaussian Mixtures in WSNs 

Both of the strategies described in Section 2 can be adopted to implement the EM algorithm in 
distributed way. However, as noted earlier in [15], there are several disadvantages for both methods. 
For the incremental strategy, the communication path should be determined before the implementation. 
Each sensor node should assign a subsequent node to form a circle containing all the sensor nodes in 
the network at first. The communication will then start from one node to its subsequent node and relay 
the information sequentially. At the last update, the global estimation for the parameter is generated in 
the reverse order. For a large scale sensor network, this strategy is untractable since the last node will 
wait a long time to receive the information from its previous node and then the global estimation for the 
parameter will need a long way to transmit to every node as well. This approach will fail when any node 
in the network is shut down or run out of energy. 

For the Consensus strategy, a consensus filter will be implemented after the E-step at each iteration, 
namely C-step (consensus step). In this step, each sensor node will exchange local information with 
its neighbors until the agreement is reached. R. Carli et al. pointed out that the time-invariant 
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communication networks with symmetries yield slow convergence to consensus, as the convergence 
rate will degrade as the number of sensor nodes increase, if the amount of information exchanged by the 
nodes does not scale well with their total number [27]. In a large scale WSN, the consensus of global 
statistics for all the sensor nodes requires quite a bit of communications to the neighbors and consume a 
lot of battery power. 

Consider a WSN in which each sensor node has local computation functions and can only 
communicate with sensors in its neighborhood. The network can be modeled as a graph Q{V, E) with 
the vertex set V = {1, ■ ■ ■ , n} and the edge set E = E V x V}. Each sensor node is a vertex 

of the graph. There is an edge between sensor nodes i and j if their distance is less than or equal to the 
radio range r, i.e., E = {(i, j)\dij < r}. We assume the graph is connected, i.e., there exists a path in E 
for any two vertices. The set of neighbors of node i is defined as the nodes which can communicate to 
nodei directly, i.e.. Mi = j) G E}. 

Power consumption of the sensor nodes in WSN can be divided into three domains: sensing, 
communication, and data processing [1]. Among them, usually a sensor node spends much more 
energy in data communication than sensing and data processing. In this paper, we also make such 
assumption. Therefore, an important aspect for performing tasks in a distributed fashion in WSN is 
to cap the communication cost. The communication cost of the topology Q{V, E) can be defined as a 
function of the adjacency elements by 



where sgn{-) is the sign function and aij is the nonnegative adjacency element of adjacency matrix A 
of topology Q{V, E) [13]. An adjacency matrix is a means of representing which vertices of a graph 
are adjacent to which other vertices. Specifically, the adjacency matrix of a finite graph Q{V., E) on n 
vertices is the n x n matrix where the non-diagonal entry is the number of edges from vertex i to 
vertex j. According to this definition, C is the same as I^EI for a digraph. 

By the definition of communication cost, the incremental strategy tends to require the least amount 
of communications. However, it inherently requires a Hamiltonian cycle through which signal estimates 
are sequentially circulated from sensor to sensor. In addition, the eventuality of a sensor failure also 
challenge the applicability of incremental schemes in large scale WSNs. The major disadvantage of the 
consensus-based algorithm is its high communication overhead requirements of each node. Motivated 
by these results and by diffusion strategies of [21-26], we propose a diffusion distributed EM algorithm 
with D-step (diffusion-step) between E-step and modified M-step. In such algorithm each sensor node 
exchanges local information with its neighboring nodes only once after the local statistics has been 
computed based on the local data at each iteration in EM algorithm. In the E-step, sensor nodes compute 
the local statistics by using local observation data and parameters estimated at the last iteration. In 
the D-step, each node exchanges local information only with its current neighbors. In the M-step, the 
sensor nodes communicate their local preestimates with their neighbors and perform average operations 
to obtain the estimation of parameters at this iteration. This DEM algorithm is summarized as follows. 



n 
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(1) E-step 

At time t, the nodes compute their local statistics with local observation data and the estimated 
parameter at the last step as follows, 



Nm. Nm N„ 



n=l n=l n=l 

(2) D-step 

Each node communicates its local statistics with its neighbors as follows. 



^rn,i I* /-I |»/-| A,i' 7m,i i * r i '^^^ 



(3) M-step 

Each node computes the pre-estimation and communicates it to its neighbors to perform the 
average according to 



t _ ^ Sr~^ t yt _ ^ Sr~^ yt t _ ^ ST^ /QN 

r'm,i \ KfA / j r'hii m,i I AT I / ^ "m,i I \f.\ / j l,i ^ 



where J\fi denote the current neighbor set of node m. 

In our proposed method, each sensor node needs to transmit the local statistics and preestimates 
to neighboring nodes at each iteration while the distributed EM with consensus strategy need much 
more amount of communication to achieve consensus, especially in large scale WSNs. Without loss 
of generality, we consider the one-dimensional Gaussian mixtures with K components. Each node 
has 3K scalars of local statistics {{Sm,i^ (^rn,i^lrn,i}^^ = 1- ■ ■ K) that need to be transmitted. In the 
incremental strategy, the communication complexity per node is ?)K = 0{K), since the local statistics 
only need to be transmitted to the next node. In the consensus strategy, the communication complexity 
per node is 0{\Afi\ ■ K'^), since the local statistics need to be exchanged between all of the neighbors 
until they achieve consensus [15,17]. The communication complexity per node for the diffusion strategy 
is \N'i\ ■ 3K = 0{\J\fi\ ■ K), which is much lower than the consensus strategy. 

4. Performance Analysis 

An important question is how well does the proposed DEM algorithm perform the estimation task. 
That is, how close does the local estimate get to the desired parameter as time evolves. Analyzing 
the performance of such diffusion scheme is challenging since each node is influenced by the local 
statistics from the neighboring nodes. In this section, we analyze the performance of the proposed DEM 
algorithm and show that it can be considered as a stochastic approximation method to find the maximum 
likelihood estimator. That is, if an infinite number of data, which are drawn independently according to 
the probability distribution p{y), are available for each sensor, the DEM algorithm can be considered as a 
stochastic approximation for obtaining the maximum likelihood estimation of the Gaussian mixtures for 
all the sensor nodes in the networks. Under some rough conditions, the maximum likelihood estimator is 
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consistent, which means that as the number of data goes to infinity the estimator converges in probability 
to its true value. The main result of this section will be concluded in Theorem 1 at the end of this section. 

First, we introduce a symbol below to denotes a weighted mean with respect to the posterior 
probability that proposed in Equation (3) as a membership function for each observation data, 

1 ^ 

{f{y))^{N) = J^Y. fiyn)pi^\yn, 0) (10) 
n=l 

where i denotes the i-th component of Gaussian Mixtures and N denotes the number of all data. Under 
this notation, the M-step in Equation (2) can be written as 

i^\(An {y)A!^l T {{y-^^^){y-^^i)')^{N) 

a. = {lUNl = = jj^^ (11) 

If an infinite number of data are drawn independently according to the unknown probability 
distribution density p{y), the weighted mean converges to the following expectation value, 

(/(y)),(iV) ^ E[f{yM^\yt, 0)]p, N^oo (12) 

where E[-]p denotes the expectation with respect to the probability distribution density p{y). In this case, 
the M-step at the t-th iteration can be written as 



a 



t+i 



E[p{t\y,e%, p. 



t+i 



E[ypit\y,9% 
EW\y,9% 



i+l 



E[iy - fii)iy - fiiypii\y,9*)]f 



EW\y,e% 

At an equilibrium point of this EM algorithm, the estimated parameter at the last step and current 
step becomes the same: 9^ = 9^^^. The equilibrium condition of this EM algorithm is equivalent to the 
maximum likelihood condition, 

dL{9)/d9 = 0 (13) 
where the log-likelihood function L{9) is defined as: 



L{9) = E 



K 



log ^aiP{y\9) 



(14) 



Since we consider the asymptotical property of the DEM algorithm, one more data can be taken 
at each iteration. Therefore, the local standard EM algorithm for each sensor node is a modified 
on-line version of the EM algorithm [28]. In the on-hne EM algorithm, the weighted mean defined in 
Equation (10) is replaced by 



N / N \ 

{{fm^{N) = viN)Y,[ n m) ^ fiyn)p{^\yn,9n-l] 

n=l \r?i=?i+l / 
/ N N 

^(^) = E n 



-1 



(15) 



^n=l m=n+l 



Sensors 2011, 11 



6306 



where the parameter A(m),0 < A(s) < 1 is a time dependent forgetting factor, which is introduced 
for forgetting the effect of the former observation to the current estimator. The parameter ri{n) is a 
normalization coefficient. 

The modified weighted mean which is proposed in Equation (15), can be written as the 

step- wise version as follows: 

{{f{y))W) = {{m)Ut-i) + vmf{yM{t)-{{mMt-i)} 

Vit) = (l + A(t)Mt-l))-i (16) 

where pi{t) = p{i\yt,9^~^), and 9^ denotes the estimation of parameter after t iterations with 
t observations. 

After computing the modified weighted means according to Equation (16), the current estimation of 
parameter 6*" can be written as: 

= = mwr = — wm — ^''^ 

In the basic on-line EM algorithm, for the t-th observation yt, the weighted means are calculated by 
using the step- wise Equation (16), and the current estimation of parameter is computed according to 
Equation (17). It can be shown that the on-line EM algorithm is equivalent to the standard batch EM 
algorithm if a specific assignment of the forgetting factor is employed [28]. 

For the WSN case, the weighted means are actually the local statistics, which are denoted as 

= {{yMt), {{{y - /i.)(y - 

where m denotes the m-th sensor node and i denotes the i-th component of the Gaussian Mixtures. The 
local on-line EM algorithm for the m-th sensor node can be written in an abstract form according to 
Equations (16) and (17): 



v{t)[F{y,e' 

el = H{^l) (18) 

where 6^ denotes the estimation of the parameters for m-th node at time t, and 0^ = {(/>m,i! " " " ? (f'ln^K} 
denotes the local statistics for all components. 

A distributed dynamic system consists of M dynamic elements over an information networks can be 
described in the following: 



m 



m = l,--- ,M 

leAfm 

where Mm denotes the set of neighboring nodes for node m in the network. This system can be written 
as a matrix form 

i* = -Lz^ (19) 
where the L is the graph Laplacian, and we refer to Equation (19) as a first-order Laplacian dynamics. 
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The D-step of DEM algorithm, which is proposed in the previous section, can be formulated by the 
following Laplacian dynamics in the discrete form 



m Vm 



Therefore, the abstract form of standard on-line EM algorithm proposed in Equation (18) can be 
replaced by 

oL = ml) (21) 

From Equation (12), it can be easily proved that the following equations hold 

0^ = E[F{y,e)], 

9^ = H{(t)m) (22) 

which are equivalent to the maximum likelihood condition in Equation (13). Thus, the diffusion on-line 
EM algorithm can be written as, 

54>l = vmE[F{y, H{4>'-'))]p - 4>'-' + mavu 0*-') (23) 
where = {4>\^^, I G ■N'^^}, and the stochastic noise term C, is defined by 

0*-^) = F{y,, H{<j>'~')) - E[F{y, H{<j>'^'))], + i^t^'' " 0^^"') (24) 

Equation (23) has the same form as the Robbins-Monro stochastic approximation [29], which finds 
the maximum likelihood estimation given by Equation (22), as well as Equation (13). Therefore, 
the estimation of the parameter for each sensor node reaches the same maximum likelihood point 
asymptotically with the diffusion strategy for local statistics embedded in the standard EM algorithm. 

For the convergence proof of the stochastic approximation in Equation (23), several conditions about 
the stochastic noise and the coefficients should be considered. It is easy to verify that the expectation of 
the stochastic noise satisfies, 

^[C(l/,0m)]=O (25) 

and the noise variance V"ar[(^]p is given by. 



Var[((y„ < 2Var [F{y, H(.#.,„))l„ + 2Var 



(26) 



Both terms on the right side of Equation (26) are bounded if we assume the data distribution density 
p{y) has a compact support. 

The normalized coefficient r]{t) should satisfy the conditions, 

oo oo 

7]{t) ^ 0, t ^ OO, ^(^) = 5Z W < °° (27) 

t=l t=l 
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Recalling the definition of the coefficients \{t) and rjit) in Equations (15) and (16), we have 
the constraints, 

0 < X{t) < 1, 1/t < ri{t) < 1 (28) 
Therefore, we define the r]{t) as, 

V{t) = ^, 0 < a < 1 (29) 
which satisfies the constraints in Equations (27) and (28). 

Theorem 1 If an infinite number of data, which are drawn independently according to the Gaussian 
mixture model (1), are available, the proposed DEM algorithm (7), (8), (9) can be considered as a 
stochastic approximation for obtaining the maximum likelihood estimator of the desired parameters. 

5. Simulations 



In this section, we will present simulation results to illustrate the effectiveness of our proposed 
algorithm. We randomly generate n = 100 sensor nodes in a 10 x 10 square, each sensor node takes 100 
observation data. Two nodes will be connected with an edge while the distance between them is less than 
the communication range r of each node. Figure 1 shows the cases of different communication range 
with r = 0, r = 1.5, r = 2.0 and r = 2.5. Since the incremental strategy EM algorithm is intractable for 
medium and large scale WSNs, we only investigate the local, diffusion and consensus EM algorithms 
in simulations. 

Figure 1. The connected network of 100 randomly distributed sensors with different 
radio ranges. 




(c) Radio Range=2.0 (d) Radio Range=2.5 
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5. 1. DEM Algorithm with 1 -Dimensional Data 

First, we consider the 1 -dimensional Gaussian mixtures with two components. Each of the Gaussian 
components is with mean 6 and 15, variance 1 and 9, respectively. In the first 50 nodes (node 1 
to node 50), 80% observations come from the first Gaussian component, and the rest 20% are from 
the second component. In the last 50 (node 51 to node 100), 40% observations come from the first 
component while the rest 60% are from the second one. The communication range is set to be 1.5 in 
this simulation. 

The performance of estimation for each sensor node is shown in Figure 2. For comparison, the EM 
algorithm with and without information exchanging (local standard EM) are tested. The estimates of 
both mean and variance are significantly different for each sensor node with standard EM algorithm only 
based on the local data, while the estimation of both mean and variance with diffusion distributed EM 
algorithm are almost the same for each sensor node. 

Figure 2. Estimated mean and covariance with and without diffusion. 
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1 WS/,/VA7yAv\AAY^VvA/V^^ 
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The estimation performance for local, diffusion and consensus EM algorithms is demonstrated in 
Figure 3. The average estimation error (AVE) is defined as 



AVE 



n 

n ^-^ 



Q\ 



(30) 



where n denotes the number of sensor nodes, Q denotes the desired parameter and Qi denotes the 
parameter estimate for the i-th node. From the figure, we can find that the local EM has the worst 
estimation performance. The diffusion EM is slightly worse than the consensus EM algorithm but with 
much less communication overhead, as depicted in Figure 4. This will lead to conserve energy and 
prolong the system lifetime. The values of log-likelihood function 



M N„ 
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(31) 
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are also shown in Figure 5. The likelihood for diffusion EM converges to almost the same value as the 
consensus EM. 

Figure 3. Performance comparison for different EM algorithms. 
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Figure 4. Communication overhead for diffusion EM and consensus EM. 
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Figure 5. Likelihood function for different EM algorithms. 



X 10 




lil<eiitiood 



-a — iocal EM 
-G — diffusion EM 
— & — consensus ElVl 



20 



25 30 
iteration steps 



35 



40 



45 



50 



Sensors 2011, 11 



6311 



The scalability of the proposed diffusion EM algorithm is also investigated, with n = 1, 000 sensor 
nodes randomly generated in the same square. Each sensor node still takes 100 observation data and the 
radio range is set to 0.2. The estimated mean and variance values for all nodes are depicted in Figure 6 
for both local EM and diffusion EM. The estimation performance for local, diffusion and consensus EM 
are demonstrated in Figure 7. we can still find that the local EM has the worst estimation performance 
while diffusion EM is slightly worse than the consensus EM algorithm. 

Figure 6. Estimated mean and covariance with and without diffusion. 
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Figure 7. Performance comparison for different EM algorithms with 1,000 sensor nodes. 
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5.2. DEM Algorithm with 2 -Dimensional Data 



In this subsection, we consider the 2D Gaussian density with n = 100 sensor nodes randomly 
generated in the same square. Each sensor node still takes 100 observation data. The observations are 
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generated from 2D Gaussian mixtures with three components distributed in Figure 8. The observations 
for each sensor node are collected as follows. In the first 30 nodes (node 1 to node 30), 90% observations 
come from the first Gaussian component while the rest 10% observations come evenly from the other 
two Gaussian components. In the next 40 nodes (node 31 to node 70), 80% observations come from the 
second Gaussian component while the rest 20% observations come evenly from the other two Gaussian 
components. In the last 30 nodes (node 71 to node 100), 90% observations come from the third Gaussian 
component while the rest 10% observations come evenly from the other two Gaussian components. 

Figure 8. Data distribution with 3-components Gaussian mixtures. 






The first estimated mean values during the iteration process for ten random selected sensor nodes are 
demonstrated in Figure 9. The radio range is set to 2 for each sensor. The estimated mean values almost 
converge to the same value for diffusion EM while the estimated mean values converge to different 
values for local EM algorithm. 

Figure 9. Estimated mean values by EM with and without diffusion. 
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We also investigated the impact of estimation performance and communication overhead with 
different communication ranges. Since local processing/computing is much less expensive than 
communication, we emphasize on the tradeoff between estimation performance and communication 
overhead. From Figure 10 we can see that increasing the communication range for each node can notably 
improve the estimation performance but also significantly increase the communication overhead. 
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Different communication ranges for each node are also considered as well as the different exchange 
steps at each iteration for each node. In Figure 11, we show the performance of first estimated mean 
value for each sensor node with different communication ranges. Black line with diamond denotes the 
estimated mean value of the first component. Red line with circle denotes the estimated mean value of 
the second component. Blue line with triangle denotes the estimated mean value of the third component. 
The second estimated mean value has a similar result and is ignored here for simplicity. Without 
communication between sensor nodes, each estimated mean value has a relative accurate section, which 
represents reliable estimates of those nodes whose most observations come from the corresponding 
Gaussian component. Other nodes cannot properly estimate the parameters due to hmited observations 
used. With diffusion between sensor nodes, the estimated mean values in all nodes approximate to their 
true values (the true values are 0.2, 0.4, and 0.7). With the increasing of the communication ranges, the 
estimation performance performance has been improved. 

Figure 10. Estimation performance and communication overhead versus radio range. 
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Figure 11. Estimated mean values versus radio range. 
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In addition, we consider the fixed communication range of each node with different steps of 
information exchange in the D-step of each iteration. From Figure 12, we can see that the estimation 
performance has been improved with increased exchange steps. These two simulations also show the 
trade-off between the estimation performance and communication overhead of sensor nodes. 

Figure 12. Estimated mean values versus steps of information exchange. 




40 60 
Nodes 



6. Conclusions 

This paper has presented a diffusion scheme of EM algorithm for distributed estimation of Gaussian 
mixtures in WSNs. In the E-step of this method, sensor nodes compute the local statistics by using local 
observation data and parameters estimated at the last iteration. A diffusion step is implemented over 
the communication networks after the E-step. In this step, the communication network is modeled as 
a connected graph, and each node exchanges local information only with its neighbors. In the M-step, 
the sensor nodes compute the estimation of parameter using the updated local statistics by the D-step 
at this iteration. Compared with the existing distributed EM algorithms, our proposed method can 
extensively reduce communication overhead for each sensor node while maintaining the estimation 
performance. In addition, we have shown that the proposed DEM algorithm can be considered as a 
stochastic approximation method to find the maximum likelihood estimation for Gaussian mixtures. 
Simulation results show good performance of our method. 
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